Search for: All records
Total Resources: 4
Author / Contributor
- Mudigere, Dheevatsa (4)
- Ghobadi, Manya (2)
- Jia, Zhihao (2)
- Kewitsch, Anthony (2)
- Khazraee, Moein (2)
- Wang, Weiyang (2)
- Zhang, Ying (2)
- Zhong, Zhizhen (2)
- Ding, Yufei (1)
- Feng, Boyuan (1)
- He, Xi (1)
- Huang, Guyue (1)
- Jahani, Majid (1)
- Li, Ang (1)
- Ma, Chenxin (1)
- Mokhtari, Aryan (1)
- Muthiah, Bharath (1)
- Ribeiro, Alejandro (1)
- Takac, Martin (1)
- Wang, Yuke (1)
The deployment of Deep Learning Recommendation Models (DLRMs) involves parallelizing extra-large embedding tables (EMTs) across multiple GPUs. Existing works overlook the input-dependent behavior of EMTs and parallelize them in a coarse-grained manner, resulting in unbalanced workload distribution and excessive inter-GPU communication. To address this, we propose OPER, an algorithm-system co-design with OPtimality-guided Embedding table parallelization for large-scale Recommendation model training and inference. The core idea of OPER is to exploit the connection between DLRM inputs and the efficiency of distributed EMTs in order to derive a near-optimal parallelization strategy. Specifically, we conduct an in-depth analysis of different types of EMT parallelism and propose a heuristic search algorithm that efficiently approximates an empirically near-optimal EMT parallelization. Furthermore, we implement a distributed shared-memory-based system that supports the lightweight but complex computation and communication patterns of fine-grained EMT parallelization, effectively converting the theoretical improvements into real speedups. Extensive evaluation shows that OPER achieves 2.3× and 4.0× average speedups in training and inference, respectively, over state-of-the-art DLRM frameworks.
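The abstract's workload-balancing idea can be made concrete with a small sketch. Below is a minimal, hypothetical illustration of input-aware embedding-table placement, assuming each table's lookup cost has been profiled from sample inputs: a greedy longest-processing-time heuristic places each table on the currently least-loaded GPU. This is not OPER's actual search algorithm; `balance_tables`, `table_costs`, and the example numbers are invented for illustration.

```python
# Minimal sketch of input-aware embedding-table (EMT) placement.
# Assumption: each table's cost is a scalar proxy for its lookup workload
# (e.g., rows touched per batch x embedding dim), profiled from inputs.
# NOT OPER's algorithm -- just a greedy longest-processing-time heuristic.
import heapq


def balance_tables(table_costs: dict[str, float], num_gpus: int) -> dict[int, list[str]]:
    """Assign each table (heaviest first) to the least-loaded GPU."""
    loads = [(0.0, gpu) for gpu in range(num_gpus)]  # min-heap of (load, gpu)
    heapq.heapify(loads)
    placement: dict[int, list[str]] = {gpu: [] for gpu in range(num_gpus)}
    for table, cost in sorted(table_costs.items(), key=lambda kv: -kv[1]):
        load, gpu = heapq.heappop(loads)
        placement[gpu].append(table)
        heapq.heappush(loads, (load + cost, gpu))
    return placement


if __name__ == "__main__":
    # Hypothetical per-table cost estimates from profiled input traces.
    costs = {"emt_user": 9.0, "emt_item": 7.5, "emt_ad": 3.0, "emt_geo": 1.5}
    print(balance_tables(costs, num_gpus=2))
    # -> {0: ['emt_user', 'emt_geo'], 1: ['emt_item', 'emt_ad']} (10.5 each)
```

Under this toy cost model the two GPUs end up with equal load (10.5 each), whereas a naive contiguous split of the same tables would yield 16.5 versus 4.5; input-dependent cost estimates are what make the balanced placement possible.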
Wang, Weiyang; Khazraee, Moein; Zhong, Zhizhen; Ghobadi, Manya; Jia, Zhihao; Mudigere, Dheevatsa; Zhang, Ying; Kewitsch, Anthony (2023). 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23).
Wang, Weiyang; Khazraee, Moein; Zhong, Zhizhen; Ghobadi, Manya; Jia, Zhihao; Mudigere, Dheevatsa; Zhang, Ying; Kewitsch, Anthony (2023). 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23).
Jahani, Majid; He, Xi; Ma, Chenxin; Mokhtari, Aryan; Mudigere, Dheevatsa; Ribeiro, Alejandro; Takac, Martin (2020). Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics.